A LEARNING ALGORITHM FOR COMMUNICATING MARKOV DECISION PROCESSES WITH UNKNOWN TRANSITION MATRICES
نویسندگان
چکیده
منابع مشابه
A LEARNING ALGORITHM FOR COMMUNICATING MARKOV DECISION PROCESSES WITH UNKNOWN TRANSITION MATRICES by
This study is concerned with finite Markov decision processes (MDPs) whose state are exactly observable but its transition matrix is unknown. We develop a learning algorithm of the reward-penalty type for the communicating case of multi-chain MDPs. An adaptively optimal policy and an asymptotic sequence of adaptive policies with nearly optimal properties are constructed under the average expect...
متن کاملRobust Control of Markov Decision Processes with Uncertain Transition Matrices
Optimal solutions to Markov decision problems may be very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of these probabilities is far from accurate. Hence, estimation errors are limiting factors in applying Markov decision processes to real-world problems. We consider a robust control problem for a finite-state, finite-action Markov dec...
متن کاملRobust Markov Decision Processes with Uncertain Transition Matrices
Optimal solutions to Markov decision problems may be very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of these probabilities is far from accurate. Hence, estimation errors are limiting factors in applying Markov decision processes to real-world problems. We consider a robust control problem for a finite-state, finite-action Markov dec...
متن کاملLearning Unknown Markov Decision Processes: A Thompson Sampling Approach
We consider the problem of learning an unknown Markov Decision Process (MDP) that is weakly communicating in the infinite horizon setting. We propose a Thompson Sampling-based reinforcement learning algorithm with dynamic episodes (TSDE). At the beginning of each episode, the algorithm generates a sample from the posterior distribution over the unknown model parameters. It then follows the opti...
متن کاملLearning Qualitative Markov Decision Processes Learning Qualitative Markov Decision Processes
To navigate in natural environments, a robot must decide the best action to take according to its current situation and goal, a problem that can be represented as a Markov Decision Process (MDP). In general, it is assumed that a reasonable state representation and transition model can be provided by the user to the system. When dealing with complex domains, however, it is not always easy or pos...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Bulletin of informatics and cybernetics
سال: 2007
ISSN: 0286-522X
DOI: 10.5109/16771